(54) Method for automatic speech recognition of arbitrary spoken words



(57) A method and system for allowing an automatic
speech recognition (ASR) system to recognize arbitrary
words, by accessing information in a supplemental data
base. A supplemental data base is accessed to retrieve
supplementary textual information, such as a proper
name. A text-to-speech means is used to generate a
phoneme transcription of the text retrieved from the
supplemental data base, so that the transcription may be
used as a speaker independent template by the ASR
system for recognizing a spoken word.

Description

Field of the Invention 

The present invention relates to automatic speech 
recognition. More particularly, the invention relates to a 
method of using supplemental information retrieved 
from a data base in conjunction with a telephone net- 
work to assist an automatic speech recognition (ASR) 
system in recognizing a word spoken by the user. 

Background of the Invention 

For many applications, it is advantageous to use
computers to automate repetitive tasks so that the tasks
may be performed more quickly and efficiently. Speech
recognition, a type of voice technology, allows people to
interact with computers using spoken words. Speech
recognition is challenging, however, because of the
inherent variations of speech among different people.

One application of speech recognition is in a tele- 
phone network. Using automatic speech recognition 
(ASR) systems, people can communicate over the tel- 
ephone so that simple tasks can be performed without 
operator intervention. For example, speech recognition
may be used for dialing so that the telephone user need 
not remember, look up, or ask for a telephone number. 
The ability to use speech instead of physical manipula- 
tion of a user interface has kept the demand for ASR 
technology high as advances in telecommunications 
continue. Generally, there are two types of ASR systems 
used in telecommunications: speaker dependent and 
speaker independent. 

One common implementation of a speaker dependent
automatic speech recognition system uses a computer
which is "trained" by a particular speaker to respond
to the speaker's speech patterns. The training
process comprises the vocalization of a sound (i.e., a
word) to generate an analog speech input, conversion
of the speech input into signal data, generation of a
template representing the sound, and storage of the
template indexed to appropriate specific response data,
such as a computer instruction to instigate an action.

During real time operations, the words spoken by 
the training speaker are digitized and compared to the 
set of speaker dependent templates in the ASR system 
so that a match between the spoken words and a tem- 
plate can trigger a particular response by the computer. 
Speaker dependent ASR systems are used primarily 
where the training process can be justified, e.g., where 
the same individuals access the system on many occa- 
sions. 

For applications in which no individual training can
be justified, a speaker independent ASR must be used.
A common implementation of a speaker independent
ASR system uses a computer to store a composite
template or cluster of templates that represent a word
spoken by a number of different people. The templates are
derived from numerous data samples (i.e., words spoken
by a plurality of speakers) which represent a wide
range of pronunciations and variations in speech
characteristics. Speaker independent speech recognition
systems can interact with a wide variety of people
without speaker-specific training.

Telephone applications which use speaker independent
ASR to recognize spoken numbers are known
in the art. These applications are especially useful when
the vocabulary of the speaker is limited to a few menu
commands and/or numbers (e.g., 0-9). It is very difficult,
however, to recognize spoken letters (A-Z) over the
telephone network. Indeed, due to various types of noise
and bandwidth limitations, coupled with the wide variety
of speech patterns among individual speakers, the
telephone environment makes all ASR applications
(speaker dependent and speaker independent) prone
to error.

Nonetheless, a sought-after commercial application
of ASR is automating tasks associated with commercial
transactions, e.g., credit card transactions,
made via the telephone network. For example, if a
customer wishes to purchase goods or services over the
telephone, ASR could be used to gather pertinent
information and to authorize the transaction quickly and
efficiently with minimal operator intervention.

Telephone purchases of goods or services made by
using a credit/debit card may require the customer to
provide his or her name (or other predetermined
information) as a step in the transaction. Unfortunately, it is
the recognition of arbitrary spoken word information (such
as the customer's name) that has inhibited the use of
ASR technology by those entities which need it most,
such as high volume businesses.

This is because high volume businesses not only
require an ASR system to recognize arbitrary spoken
words (e.g., proper names), but also require ubiquitous
access to the ASR system. For example, to serve the
needs of a high volume business with potential customers
across an entire country using conventional ASR
systems, speaker independent templates corresponding
to the name of every person in the entire country
would have to be created and stored using techniques
described above. The present state of the art ASR
systems, however, are incapable of matching a spoken
name with one of the millions of possible names
corresponding to the stored templates.

Therefore, there is a need in the art for improvements
in ASR systems which will enable such systems
to automatically recognize spoken words with increased
capability.

Summary of the Invention 

This need is addressed and a technical advance is
achieved in the art by a method and system for utilizing
supplemental data to enhance the capability of an ASR
system so that the system can quickly and accurately
recognize arbitrary spoken words, such as proper
names.

It is common practice in telephony applications of 
an ASR system to solicit a telephone number from a call- 
er. The telephone number of the caller may be obtained 
by speech or other known methods, such as caller ID or 
touch tone entry. Therefore, in accordance with one ex- 
emplary embodiment of the method and system of the 
present invention, the telephone number of the caller 
serves as an index for retrieving text (i.e., the caller's 
name) from a data base. This text is then used to limit 
or specify the choices available to an ASR system. 

More particularly, the telephone number of the caller
is used to access a supplemental data base to retrieve
text associated with the telephone number. In the
above-mentioned exemplary embodiment, text containing
the caller's name is retrieved from the supplemental
data base. The text of the caller's name comprises a
digitized alphanumeric representation of the caller's
proper name. A text-to-speech system is used to
transcribe the text of the caller's name to a phoneme
transcription, as is known in the art. The phoneme
transcription of the name is stored in the ASR system as a
speaker independent template so that speech which conforms
to the transcription can be recognized by the ASR
system. Retrieval and conversion of supplemental textual
data to a phoneme transcription allows the ASR system
to respond immediately to spoken words which
correspond to the transcription of the data retrieved in the
absence of speaker-specific training.

During real time applications of the above exempla- 
ry embodiment, a telephone call is received by a service 
provider who maintains an ASR system in accordance 
with the present invention. The caller, who wishes to 
make a credit card purchase of a good or service, is 
prompted for a telephone number and name. Based up- 
on the telephone number provided by the caller, a sup- 
plemental data base (e.g. an electronic telephone direc- 
tory) is accessed to retrieve the text of a name associ- 
ated with the telephone number. A phoneme transcrip- 
tion of the text of the name retrieved from the data base 
is created by a text-to-speech system. The phoneme 
transcription is then stored as a speaker independent 
template to be used by the ASR system to recognize the 
name, as spoken by the caller. 
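
The retrieval and transcription steps just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of the invention: the directory contents, the LETTER_TO_PHONEME table and the function names are hypothetical, and the one-letter-to-one-phoneme mapping is a crude stand-in for the rules a real text-to-speech system applies.

```python
# Hypothetical electronic telephone directory: index (telephone number) -> name text.
DIRECTORY = {
    "5551234": "ANNA",
    "5559876": "BOB",
}

# Toy letter-to-phoneme table (an assumption for illustration only);
# a real text-to-speech system uses far richer conversion rules.
LETTER_TO_PHONEME = {"A": "ae", "B": "b", "N": "n", "O": "aa"}


def transcribe(text):
    """Return a list of phoneme symbols for the given text using the toy table."""
    return [LETTER_TO_PHONEME[ch] for ch in text.upper() if ch in LETTER_TO_PHONEME]


def build_template(phone_number, directory=DIRECTORY):
    """Look up the name for an index and return (name, phoneme template).

    Returns None when the index is not present in the directory.
    """
    name = directory.get(phone_number)
    if name is None:
        return None
    return name, transcribe(name)
```

Calling build_template("5551234") yields the retrieved name together with its phoneme list, which would then serve as the speaker independent template for the recognition attempt.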

If the name spoken by the caller is not recognized
by the ASR system due to unusual pronunciation of
names, poor telephone transmission quality, callers
whose voices are difficult to recognize, etc., the caller is
prompted to provide a spelling of their name, letter by
letter. The text-to-speech system may be used to create
a phoneme transcription of the spelling of the written
name, as retrieved from the data base. The spelling of
the name retrieved from the data base is also stored as
a speaker independent template in the ASR system so
that the system can attempt to recognize the spelling
of the caller's name (as spoken by the caller). If there is
no match of the spoken utterance of the caller's name
and the phoneme transcription (based upon established
speech recognition algorithms), the call is routed to a
human attendant.
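
One way to picture the spelling template is as the concatenation of the phonemes of each spoken letter name. The sketch below assumes that reading; the LETTER_NAMES table, its phoneme symbols and the function name are hypothetical and cover only a handful of letters.

```python
# Hypothetical phoneme sequences for spoken letter names (illustrative subset).
LETTER_NAMES = {
    "A": ["ey"],
    "B": ["b", "iy"],
    "N": ["eh", "n"],
    "O": ["ow"],
}


def spelling_template(name, letter_names=LETTER_NAMES):
    """Build a phoneme list for a name spelled aloud letter by letter."""
    phonemes = []
    for ch in name.upper():
        if not ch.isalpha():
            continue  # ignore spaces, hyphens and other non-letters
        if ch not in letter_names:
            raise KeyError(f"no letter-name entry for {ch!r}")
        phonemes.extend(letter_names[ch])
    return phonemes
```

For the retrieved text "Bob", this produces the phonemes for "bee", "oh", "bee" in sequence, which could be stored as the second template.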

Brief Description of the Drawings

FIG. 1 is a simplified block diagram of telephone
and user interactive systems associated with an
ASR system in accordance with an exemplary
embodiment of the present invention;

FIG. 2 is a simplified block diagram of the ASR system
of FIG. 1 which is used for providing speech
recognition and verification in accordance with an
exemplary embodiment of the present invention;

FIG. 3 is a flow diagram of an exemplary method in
accordance with the present invention.

Detailed Description 

FIG. 1 shows two systems 100A and 100B in accordance
with an exemplary embodiment of the present
invention. Voice telephone system 100A includes calling
station 102, telecommunications lines 103A and 103B
and switching service point (SSP) 105 which is located
in public switched telephone network 107. For clarity, a
single switching service point is shown, but an operational
public switched telephone network comprises an
interconnected network of SSPs. Telephone line interface
unit 108 protects the ASR system 110 equipment
from network malfunctions, such as power surges, and
digitizes incoming speech from calling station 102, if the
originating speech is not already digitized, before
delivery to the system.

SSP 105 is a distributed control, local digital switch,
such as a 5ESS® switch as described in the AT&T Technical
Journal, Vol. 64, No. 6, July-August 1985, pages
1303-1564, the November 1981, Bell Laboratories
Record, page 258, and the December, 1981, Bell Laboratories
Record, page 290 and manufactured by AT&T.
Alternatively, SSP 105 may be a distributed control,
analog or digital switch, such as an ISDN switching system
as disclosed in U.S. Patent No. 4,592,048, issued to M.
W. Beckner et al., on May 27, 1986. In FIG. 1, SSP 105
is connected via customer identified lines 103 to calling
station 102 and is also in communication with host
computer 124 via line 121B as described below.

Also shown in FIG. 1 is user interactive system 
100B including microphone 104 and microphone inter- 
face unit 109. Microphone 104 may be disposed in a 
kiosk or automated teller machine (not shown) main- 
tained by a service provider as a link between the ASR 
system and the consumer, as is known in the art. 

Incoming speech is transformed into electrical signals
by microphone 104 and delivered to microphone
interface unit 109 via communications link 106A. Microphone
interface unit 109 converts incoming speech signals
into digital data before delivery to ASR system 110
via communications link 106B.

ASR system 110 (described in detail in FIG. 2 below)
is in communication with host computer 124 via data
bus 125. Host computer 124 includes central processing
unit (CPU) 126 for controlling the overall operation
of the computer, random access memory (RAM) 128 for
temporary data storage, read only memory (ROM) 130
for permanent data storage and non-volatile data base
134 for storing control programs associated with host
computer 124. CPU 126 communicates with RAM 128
and ROM 130 via data buses 132. Similarly, CPU 126
communicates with non-volatile data base 134 via data
bus 133. Input/output (I/O) interface 136 is connected
to host computer 124 via data bus 135 to facilitate the
flow of data from local area network (LAN) 138 which is
in communication with I/O interface 136 via data link
139, supplementary data base 140 which is in communication
with I/O interface 136 via data link 141 and data
service network 142 which transmits digital data to host
computer 124 via telecommunications line 121A, SSP
105 and data link 121B, as described below.

FIG. 2 shows a simplified block diagram of an ex- 
emplary embodiment of ASR system 110 as shown in 
FIG. 1. ASR system 110, which is capable of either 
speaker independent or speaker dependent speech 
recognition, includes CPU 202 for controlling the overall
operation of the system. CPU 202 has a plurality of data 
buses represented generally by reference numeral 203. 
Also shown is random access memory (RAM) 204, read 
only memory (ROM) 206, speech generator unit 218 for 
issuing greetings and prompts to a caller and text-to- 
speech (TTS) system 219 (which communicates with 
CPU 202 and RAM 204) for transcribing written text into 
a phoneme transcription, as is known in the art. 

RAM 204 is connected to CPU 202 by bus 203 and 
provides temporary storage of speech data, such as 
words spoken by a caller at calling station 102 or micro- 
phone station 104, speaker dependent templates 214 
and speaker independent templates 216. ROM 206, al- 
so connected to CPU 202 by data bus 203, provides per- 
manent storage of speech recognition and verification 
data including speech recognition algorithm 208 and 
models of phonemes 210. In this exemplary embodi- 
ment, a phoneme based speech recognition algorithm 
208 is utilized, although many other useful approaches 
to speech recognition are known in the art. 

A phoneme is a term of art which refers to one of a
set of smallest units of speech that can be combined
with other such units to form larger speech segments,
e.g., morphemes. For example, the phonetic segments
of the spoken word "operator" may be represented by a
combination of phonemes such as "aa", "p", "axr", "ey",
"dx" and "axr". Models of phonemes 210 are compiled
using speech recognition class data which is derived
from the utterances of a sample of speakers in a prior
off-line process. During the process, words selected so
as to represent all phonemes of the language are spoken
by a large number of training speakers (e.g., 1000).
The utterances are processed by a trained individual
who generates a written text of the content of the
utterances.

The written text of the word is then received by a
text-to-speech unit, such as TTS system 219, so that it
may create a phoneme transcription of the written text
using rules of text-to-speech conversion, as is known in
the art. The phoneme transcription of the written text is
then compared with the phonemes derived from the
operation of the speech recognition algorithm 208, which
compares the utterances with the models of phonemes
210. The models of phonemes 210 are adjusted during
this "model training" process until an adequate match is
obtained between the phonemes derived from the
text-to-speech transcription of the utterances and the
phonemes recognized by the speech recognition algorithm
208, using adjustment techniques as are known in the art.

Models of phonemes 210 are used in conjunction 
with speech recognition algorithm 208 during the recog- 
nition process. More particularly, speech recognition al- 
gorithm 208 matches a spoken word with established 
phoneme models. If the speech recognition algorithm 
determines that there is a match (i.e. if the spoken ut- 
terance statistically matches the phoneme models in ac- 
cordance with predefined parameters), a list of pho- 
nemes is generated. 

Since the models of phonemes 210 represent a
distribution of characteristics of a spoken word across a
large population of speakers, the models can be used
for ubiquitous access to an ASR system which serves
the same speaker population represented by the training
speakers (e.g., native-born Americans, Spanish-speaking
populations, etc.).

Speaker independent template 216 is a list of pho- 
nemes which represent an expected utterance or 
phrase. A speaker independent template 216 is created 
by processing written text through TTS system 219 to 
generate a list of phonemes which exemplify the expect- 
ed pronunciations of the written word or phrase. In gen- 
eral, multiple templates are stored in RAM memory 204 
to be available to speech recognition algorithm 208. The 
task of algorithm 208 is to choose which template most 
closely matches the phonemes in a spoken utterance. 
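
The template selection task can be illustrated with a simple distance measure over phoneme lists. The text leaves the matching technique to the art, so the sketch below swaps in plain Levenshtein edit distance as an assumption; the function names, the max_distance threshold and the "help" template are hypothetical, while the "operator" phonemes are those given earlier in the description.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def best_template(utterance, templates, max_distance=2):
    """Return the label of the closest template, or None if none is close enough."""
    best_label, best_score = None, None
    for label, phonemes in templates.items():
        score = edit_distance(utterance, phonemes)
        if best_score is None or score < best_score:
            best_label, best_score = label, score
    if best_score is None or best_score > max_distance:
        return None
    return best_label
```

With templates for "operator" and "help" loaded, an utterance whose phoneme list differs from the "operator" template in a single symbol is still assigned to "operator", while an unrelated utterance falls outside the threshold and is reported as unrecognized.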

Speaker dependent templates 214 are generated 
by having a speaker provide an utterance of a word or 
phrase, and processing the utterance using speech rec- 
ognition algorithm 208 and models of phonemes 210 to 
produce a list of phonemes which comprises the pho- 
nemes recognized by the algorithm. This list of pho- 
nemes is speaker dependent template 214 for that par- 
ticular utterance. 

During real time speech recognition operations, an 
utterance is processed by speech recognition algorithm 
208 using models of phonemes 210 such that a list of 
phonemes is generated. This list of phonemes is 
matched against the list provided by speaker independ- 
ent templates 216 and speaker dependent templates 
214, using techniques as known in the art. Speech
recognition algorithm 208 reports results of the match.

FIG. 3 is a flow diagram describing the actions taken
at ASR system 110 when the system is operating in a
speaker independent mode in an exemplary embodiment
of the method of the present invention.

As an example of a commercial application of the 
present invention, assume that a customer is calling 
from a home telephone (calling station 102) and wishes
to make a credit card purchase of a service offered by 
a service provider who uses ASR system 110 and host 
computer 124. In this example, the customer has not 
previously purchased the service so ASR system 110 is 
not trained to recognize the particular speech patterns 
of the customer (i.e., there are no speaker dependent 
templates 214 established for this customer). In order 
for the credit card transaction to be authorized, however, 
ASR system 110 must receive and recognize the cus- 
tomer's name. 

The example begins when in step 300 ASR system 
110 receives a customer originated incoming call routed 
via telecommunications line 103A, 103B and SSP 105 
of public switched telephone network 107. 

Alternatively, the customer could place a service or- 
der from a kiosk which houses a user interactive system 
including microphone station 104. If so, an incoming 
"call" is received by ASR system 110 when a customer 
input (e.g., speech) is detected at microphone station 
104 and delivered to the system via communications link
106B. 

In both the telephone system and the user interactive
system, the incoming call is processed by an interface
unit (i.e., telephone line interface unit 108 and
microphone interface unit 109, respectively) to ensure that
all input received in ASR system 110 is in a common
digital format.

As shown in step 302, speech generator unit 218 of
ASR system 110 issues a greeting and prompts the
customer for an input such as a predetermined index (e.g.,
a home telephone number), a name associated with the
index and, possibly, a spelling of the name. Alternatively,
the system could defer prompting the caller for a spelling
of the name until it is needed in the process, as described
below.

The process continues to determination step 304 
where it is determined whether the requested input was 
received. If the result of step 304 is a "NO" decision, the 
process continues to step 306 where the call is routed 
to a live attendant and the process terminates at end 
step 308. 

If the result in step 304 is a "YES" decision, the proc- 
ess continues to step 318 where the customer's utter- 
ance of the index, name and spelling of the name are 
stored in RAM 204 of ASR system 110. In the above 
example, the customer provided the index by speaking. 
When the index is the customer's home telephone 
number, it may be retrieved by other known techniques, 
such as caller ID or touch tone entry. 

The process continues to step 310 where there is
an attempt to recognize the caller's index using speech
recognition algorithm 208 and model phonemes (for digits)
210. The index is used to retrieve information from
a supplemental data base, as described below. If there
is uncertainty about certain digits of the index, the system
may be programmed to recognize multiple possibilities.
In determination step 314, it is determined whether
the customer's index was recognized in the preceding
step. If the result of determination step 314 is a "NO"
decision, the process continues to step 306 where the call
is routed to a live attendant and the process terminates
at end step 308.
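
Handling uncertain digits by keeping multiple possibilities can be sketched as an expansion of per-digit alternatives into candidate index strings, each of which is then looked up. The function names and the dictionary standing in for the supplemental data base are hypothetical; how a recognizer would actually report digit alternatives is not specified here.

```python
from itertools import product


def candidate_indexes(digit_hypotheses):
    """Expand per-digit alternatives into every possible index string.

    digit_hypotheses is a list with one entry per digit position; each entry
    is a list of the digit characters considered plausible at that position.
    """
    return ["".join(combo) for combo in product(*digit_hypotheses)]


def names_for_candidates(digit_hypotheses, directory):
    """Return {index: name} for each candidate index found in the directory."""
    return {
        idx: directory[idx]
        for idx in candidate_indexes(digit_hypotheses)
        if idx in directory
    }
```

If the second digit of a three-digit index could be either "1" or "7", two candidate indexes result, and every name found for either candidate would be passed on so that a template can be created for each.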

If the result in determination step 314 is a "YES"
decision, the process continues to step 316 where CPU
202 of ASR system 110 makes a request to host
computer 124 for supplemental data. In the above example,
the supplemental data desired is a digitally stored
representation of the customer's name (e.g., the name
associated with the home telephone number received from
the customer), such as in ASCII text format.

When the request for supplemental data is received
at CPU 126 of host computer 124, CPU 126 determines
which supplemental data base must be accessed, by
using the index (e.g., the telephone number provided by
the customer), to retrieve a digital representation (e.g.,
ASCII text format) of the customer's name. CPU 126
makes the determination based on instructions received
from non-volatile data base 134.

For example, if the service provider is a large entity,
it may maintain an auxiliary data base 140, such as a
CD-ROM data base, which communicates with host computer
124 via data link 141 and I/O interface 136. Database
140 could contain comprehensive customer information
such as customer addresses and names, credit card
account numbers and purchase history indexed by
telephone number. If the service provider is a small entity
with a rapidly changing customer base, however, a limited
supplemental data base may be stored within host
computer 124 (e.g., in RAM 128).

Alternatively, some service providers may maintain
a computer network (e.g., LAN 138), from which supplemental
data may be downloaded to host computer 124
via data link 139 and I/O interface 136.

In the above example, assume that the service provider
subscribes to a data service offered by the
telecommunications network which maintains public switched
telephone network 107. The data service 142 includes
a data base in which it stores an electronic telephone
directory including the telephone numbers and
corresponding customer names of all residential
telephones in the United States. In the above example, data
service 142 sends digital data packets of information
(e.g., a text of a customer's name) via telecommunication
line 121A to SSP 105. SSP 105 delivers the digital
information to host computer 124 via telecommunications
line 121B to I/O interface 136 so that a text of the
customer name can be retrieved by host computer 124 and
stored in RAM 128. If multiple names are retrieved (due
to multiple possibilities of home telephone numbers or
multiple names associated with the telephone number),
all possible names are provided to the host computer.

The process continues in step 318 where, in response
to a request received from CPU 202, the text(s)
of the name retrieved from the supplemental data base
are retrieved from RAM 128 and processed by TTS system
219 so that a phoneme transcription of the text is
generated and stored as a speaker independent template
in RAM 204. As shown in step 320, recognition of
the caller's name as spoken by the caller (and stored in
RAM 204) is attempted using speech recognition algorithm 208,
models of phonemes 210 and the speaker independent
template(s) created in step 318.

In determination step 322, a determination is made
as to whether recognition occurred in step 320. If the
result of step 322 is a "YES" decision, the process
continues to step 324 where the transaction is authorized
and the process terminates in step 326. If the result of
step 322 is a "NO" decision, the process continues to
step 328 where a phoneme transcription of the spelling
of the customer's name (as retrieved from the data
base) is created by TTS system 219 and stored as a second
speaker independent template. In step 330, recognition
of the spelling of the customer's name, as spoken
by the customer, is attempted using speech recognition
algorithm 208, models of phonemes 210 and the second
speaker independent template created in step 328. The
process continues to determination step 332 where it is
determined if the spelling of the caller's name was
recognized. If a "NO" decision is made in step 332, the process
goes to step 306 where the call is routed to a live
attendant and the process terminates in step 308. If a
"YES" decision is made in step 332, the process continues
to step 324 where the transaction is authorized and
the process ends in step 326.
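
The decision points of the flow of FIG. 3 can be condensed into a short routine. This is a sketch only: the Outcome enumeration, the function name and the use of callables for the two recognition attempts are illustrative assumptions, and the step numbers appear in comments purely as cross-references to the description.

```python
from enum import Enum


class Outcome(Enum):
    AUTHORIZED = "authorized"  # step 324
    ATTENDANT = "attendant"    # step 306


def handle_call(input_received, index_recognized, recognize_name, recognize_spelling):
    """Walk the decision points of the flow and return the outcome.

    recognize_name and recognize_spelling are zero-argument callables
    (hypothetical stand-ins for the recognition attempts), so the spelling
    attempt runs only when the name attempt fails.
    """
    if not input_received:       # determination step 304
        return Outcome.ATTENDANT
    if not index_recognized:     # determination step 314
        return Outcome.ATTENDANT
    if recognize_name():         # steps 320 and 322
        return Outcome.AUTHORIZED
    if recognize_spelling():     # steps 328 through 332
        return Outcome.AUTHORIZED
    return Outcome.ATTENDANT
```

Passing the recognition attempts as callables mirrors the ordering in the description: the second template is created and tried only after the first attempt has failed.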

The above example illustrates real time interactions 
among a customer using a telephone or a user interac- 
tive system, ASR system 110, host computer 124, and 
a supplemental data base. However, there may be other 
embodiments in which ASR system 110 is accessed by 
LAN 138 or embodiments in which customer names are
recorded and stored in a database over a period of time 
and the data service provided by data base 142 is peri- 
odically accessed by ASR system 110 and host compu- 
ter 124. 

The method and system of the present invention
achieve advantages over the prior art in that an ASR
system can recognize arbitrary spoken words without
speaker-specific training. It is to be understood that the
above-described embodiments are for illustrative purposes
only and that numerous other arrangements of
the invention may be devised by one skilled in the art
without departing from the scope of the invention as
defined by the claims which follow.



Claims 

1. In an automatic speech recognition (ASR) system
having a first data base that stores word models and
correlation data on which word recognition decisions
are at least partially based, a method of utilizing
information stored in a supplemental second
data base to enhance capability of the ASR system,
the method comprising:

receiving an input from a user, said input having
first and second portions;
storing the input obtained from the user in the
ASR system;
the ASR system recognizing the first portion of
the input retrieved from the user;
identifying and retrieving supplemental information
stored in the supplemental data base related
to the first portion of the input;
creating a template derived from the information
retrieved from the supplemental data base;
and
using the template to recognize the second portion
of the input as spoken by the user.

2. The method of claim 1 wherein the step of receiving
an input from a user comprises the step of receiving
a spoken telephone number and a spoken name
corresponding to said first and second portions of
the input, respectively.

3. The method of claim 2 wherein the step of using the
template to recognize the second portion comprises
the step of using a speech recognition algorithm to
recognize the spoken name.

4. The method of claim 1 wherein the step of creating
a template comprises the step of using a text-to-speech
system to generate a phoneme transcription
of the second portion of the input.

5. The method of claim 1 wherein the step of receiving
an input from a user comprises the step of receiving
an index and a spoken utterance, corresponding to
the first and second portions of the input, respectively.

6. In a telephone network, a method of using a
supplemental data base associated with an automatic
speech recognition (ASR) system to enhance the
capability of the ASR system, the method comprising
the steps of:

receiving an incoming call from a caller;
prompting the caller for first and second utterances;
recognizing the first utterance spoken by the
caller;
retrieving supplemental information stored in a
supplemental data base using the first utterance
as an index, wherein the supplemental information
is a written text;
creating a speaker independent template of the
written text as retrieved from the supplemental
data base; and
using the speaker independent template to recognize
the second utterance corresponding to
the written text retrieved from the supplemental
data base.

7. The method of claim 6 wherein creating a template 
comprises using a text-to-speech system to gener- 
ate a phoneme transcription of the supplemental in- 
formation. 

8. The method of claim 6 wherein recognizing the sec- 
ond utterance comprises using a speech recogni- 
tion algorithm and models of phonemes. 

9. An automatic speech recognition (ASR) system for 
use in conjunction with a telephone network, the 
ASR system comprising: 

a calling station served by a switching service 
point; 

means for receiving an input; 
a telephone line interface unit for delivering the 
input received from the calling station to a ran- 
dom access memory of a host computer; 
a central processing unit in the host computer 
for retrieving supplemental information from a 
supplemental data base; 
a text-to-speech means for creating a phoneme 
transcription of the supplemental information 
retrieved from the supplemental data base; and 
a speech recognition means for recognizing an 
utterance associated with the supplemental in- 
formation. 

10. The ASR system of claim 9 wherein the supplemental
data base is maintained by a data service
provider.

11. The ASR system of claim 9 wherein the supplemen- 
tal data base is stored in a CD-ROM. 

12. The ASR system of claim 9 wherein the supplemental
information is retrieved from a local area
network.

13. The ASR system of claim 9 wherein the phoneme 
transcription is used as a speaker independent tem- 
plate. 

14. The ASR system of claim 9 wherein the means for
receiving an input is a caller ID service.

15. The ASR system of claim 9 wherein the means for
receiving an input is a touch tone entry means.

16. The ASR system of claim 9 wherein the speech
recognition means comprises an algorithm for comparing
model phonemes to spoken utterances.